Separate Training




Asymmetric Conflict and Synergy in Post-training for LLM-based Multilingual Machine Translation

Zheng, Tong, Wen, Yan, Bao, Huiwen, Guo, Junfeng, Huang, Heng

arXiv.org Artificial Intelligence

The emergence of Large Language Models (LLMs) has advanced multilingual machine translation (MMT), yet the Curse of Multilinguality (CoM) remains a major challenge. Existing work in LLM-based MMT typically mitigates this issue by scaling up the training and computation budget, which raises a critical question: is scaling up the training and computation budget truly necessary for high-quality MMT, or can a deeper understanding of CoM provide a more efficient solution? To explore this problem, we analyze linguistic conflict and synergy, the underlying mechanisms of CoM, during the post-training phase. We identify an asymmetric phenomenon: whether conflict or synergy dominates varies across translation directions, leading to sub-optimal adaptation in existing post-training methods. We further find that a significant bottleneck in MMT appears to lie in post-training rather than multilingual pre-training, suggesting the need for more effective adaptation strategies. Building on these insights, we propose a direction-aware training approach, combined with group-wise model merging, to explicitly address the asymmetry in linguistic conflict and synergy. Leveraging this strategy, our method fine-tunes X-ALMA-13B-Pretrain, a model trained only with multilingual pre-training, and achieves performance comparable to X-ALMA-13B (SFT only) while using only 20B pretraining tokens and 17B parameters (5.5x fewer pretraining tokens and a 1.7x smaller model), with just a 0.85 COMET drop on the Flores-200 test sets of 50 languages.
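Group-wise model merging, as described here, can be as simple as averaging the weights of direction-specific checkpoints within a language group. The sketch below shows plain uniform weight averaging in PyTorch; the group definitions, checkpoint names, and uniform weighting are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of group-wise model merging: models fine-tuned on
# different translation directions within a language group are merged
# by uniform weight averaging. Uniform weights and the example group
# are assumptions for illustration.
from collections import OrderedDict
import torch

def merge_state_dicts(state_dicts):
    """Uniformly average a list of compatible model state dicts."""
    merged = OrderedDict()
    for key in state_dicts[0]:
        merged[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return merged

# Hypothetical usage: one checkpoint per translation direction in a group.
# group_ckpts = [torch.load(f"ckpt_{d}.pt") for d in ("en-de", "en-fr")]
# model.load_state_dict(merge_state_dicts(group_ckpts))
```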


Slicing Vision Transformer for Flexible Inference

Zhang, Yitian, Coskun, Huseyin, Ma, Xu, Wang, Huan, Ma, Ke, Chen, Xi, Hu, Derek Hao, Fu, Yun

arXiv.org Artificial Intelligence

Vision Transformers (ViTs) are known for their scalability. In this work, we aim to scale down a ViT to fit in an environment with dynamically changing resource constraints. We observe that smaller ViTs are intrinsically sub-networks of a larger ViT at different widths. Thus, we propose a general framework, named Scala, to enable a single network to represent multiple smaller ViTs with flexible inference capability, which aligns with the inherent design of ViT to vary in width. Concretely, Scala activates several subnets during training, introduces Isolated Activation to disentangle the smallest sub-network from the other subnets, and leverages Scale Coordination to ensure each sub-network receives simplified, steady, and accurate learning objectives. Comprehensive empirical validation on different tasks demonstrates that with only one-shot training, Scala learns slimmable representations without modifying the original ViT structure and matches the performance of Separate Training. Compared with the prior art, Scala achieves an average improvement of 1.6% on ImageNet-1K with fewer parameters. Code is available here.
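The core idea, that a narrower sub-network reuses the leading channels of a wider layer's weights, can be illustrated with a width-slicable linear layer. This is a minimal sketch under that assumption, not Scala's implementation: the layer, the width ratio, and the slicing convention are illustrative.

```python
# Minimal sketch of width slicing: smaller "sub-networks" reuse the
# leading channels of a larger linear layer's weight matrix. The layer
# and sampling scheme are illustrative, not the paper's code.
import torch
import torch.nn as nn

class SlimmableLinear(nn.Linear):
    """Linear layer whose active output width can be scaled at run time."""
    def forward(self, x, width_ratio=1.0):
        out_f = max(1, int(self.out_features * width_ratio))
        in_f = x.shape[-1]  # input may already be sliced upstream
        weight = self.weight[:out_f, :in_f]
        bias = self.bias[:out_f] if self.bias is not None else None
        return nn.functional.linear(x, weight, bias)

layer = SlimmableLinear(768, 768)
x = torch.randn(4, 768)
full = layer(x)                   # full-width sub-network
half = layer(x, width_ratio=0.5)  # 50%-width sub-network, shared weights
print(full.shape, half.shape)     # torch.Size([4, 768]) torch.Size([4, 384])
```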


Employing Layerwised Unsupervised Learning to Lessen Data and Loss Requirements in Forward-Forward Algorithms

Hwang, Taewook, Seo, Hyein, Jung, Sangkeun

arXiv.org Artificial Intelligence

Recent deep learning models such as ChatGPT, trained with the back-propagation algorithm, have exhibited remarkable performance. However, the disparity between biological brain processes and the back-propagation algorithm has been noted. The Forward-Forward algorithm, which trains deep learning models solely through forward passes, has emerged to address this. Although the Forward-Forward algorithm cannot replace back-propagation due to limitations such as the need for special inputs and loss functions, it has the potential to be useful in situations where back-propagation is difficult to apply. To work around this limitation and verify usability, we propose an Unsupervised Forward-Forward algorithm. Using an unsupervised learning model enables training with ordinary loss functions and unrestricted inputs. This approach leads to stable learning and versatile use across various datasets and tasks. From a usability perspective, given the characteristics of the Forward-Forward algorithm and the advantages of the proposed method, we anticipate practical application even in scenarios such as federated learning, where deep learning layers need to be trained separately in physically distributed environments.
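For readers unfamiliar with the baseline this builds on, the sketch below shows standard layer-wise Forward-Forward training with the goodness objective: each layer optimizes a purely local loss, and activations are detached between layers so no gradient ever crosses a layer boundary. The network sizes and data are placeholders, and the paper's unsupervised variant (which removes the special positive/negative inputs) is not reproduced here.

```python
# Minimal sketch of layer-wise Forward-Forward training. Each layer is
# updated with its own local objective; inputs to the next layer are
# detached, so the learning signal never back-propagates across layers.
import torch
import torch.nn as nn

layers = [nn.Linear(784, 256), nn.Linear(256, 256)]
opts = [torch.optim.SGD(l.parameters(), lr=0.03) for l in layers]

def goodness(h):
    return h.pow(2).sum(dim=1)  # per-sample "goodness" of activations

x_pos = torch.randn(32, 784)  # placeholder positive/negative data
x_neg = torch.randn(32, 784)

for layer, opt in zip(layers, opts):
    h_pos, h_neg = torch.relu(layer(x_pos)), torch.relu(layer(x_neg))
    # push positive goodness up, negative goodness down (local loss only)
    loss = torch.log1p(torch.exp(-(goodness(h_pos) - goodness(h_neg)))).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # detach so the next layer trains locally, with no cross-layer gradient
    x_pos, x_neg = h_pos.detach(), h_neg.detach()
```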


Austin city agency offers racially segregated 'anti-racist' trainings for 'white folks' and 'people of color'

FOX News

Fox News host Greg Gutfeld goes over this week's leftovers, and Gutfeld! reacts to the resurfacing of an old DEI training video by former Navy DEI director Dr. Charles "Chuck" Barber. A city agency in Austin, Texas invited employees to racially segregated "anti-racist" meetings where "white folks" were asked not to attend a meeting that was only for "people of color." A January email obtained by Fox News Digital reveals that the equity and inclusion coordinator of Austin's Parks & Recreation Department invited employees to attend "Antiracist Affinity Spaces," consisting of two separate trainings segregated by race as part of an "Equity and Inclusion program." "For People of Color*: Once a month, PARD employees of color will meet up at various city sites," the email says. "The first 1.5 hours will be for fostering dialogue and the last 30 minutes will be for networking. This monthly space will offer folks the opportunities to gather and connect with other PARD employees of color, share about our personal and professional experiences with racism, and learn about mentoring and job opportunities for professional development."


High Dimensional Causal Inference with Variational Backdoor Adjustment

Israel, Daniel, Grover, Aditya, Broeck, Guy Van den

arXiv.org Machine Learning

Backdoor adjustment is a technique in causal inference for estimating interventional quantities from purely observational data. For example, in medical settings, backdoor adjustment can be used to control for confounding and estimate the effectiveness of a treatment. However, high dimensional treatments and confounders pose a series of potential pitfalls in tractability, identifiability, and optimization. In this work, we take a generative modeling approach to backdoor adjustment for high dimensional treatments and confounders. We cast backdoor adjustment as an optimization problem in variational inference without relying on proxy variables or hidden confounders. Empirically, our method is able to estimate interventional likelihoods in a variety of high dimensional settings, including semi-synthetic X-ray medical data. To the best of our knowledge, this is the first application of backdoor adjustment in which all the relevant variables are high dimensional.
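For context, the classical backdoor adjustment identity the abstract builds on, with treatment t, confounder z, and outcome y, is shown below together with the standard Monte Carlo estimator used when z can be sampled from a model of p(z). The variational machinery the paper adds for high dimensional z is not reproduced here.

```latex
p(y \mid \mathrm{do}(t))
  = \int p(y \mid t, z)\, p(z)\, \mathrm{d}z
  \approx \frac{1}{N} \sum_{i=1}^{N} p(y \mid t, z_i),
  \qquad z_i \sim p(z)
```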


Separate Training for Conditional Random Fields Using Co-occurrence Rate Factorization

Zhu, Zhemin, Hiemstra, Djoerd, Apers, Peter, Wombacher, Andreas

arXiv.org Artificial Intelligence

The standard training method for Conditional Random Fields (CRFs) is very slow for large-scale applications. As an alternative, piecewise training divides the full graph into pieces, trains them independently, and combines the learned weights at test time. In this paper, we present separate training for undirected models based on the novel Co-occurrence Rate Factorization (CR-F). Separate training is a local training method that, in contrast to MEMMs, is unaffected by the label bias problem. Experiments show that separate training (i) is indeed unaffected by the label bias problem; (ii) reduces training time from weeks to seconds; and (iii) obtains results competitive with standard and piecewise training on linear-chain CRFs.
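The co-occurrence rate at the heart of CR-F measures how much two adjacent labels co-occur beyond what independence predicts: CR(a, b) = p(a, b) / (p(a) p(b)). The sketch below estimates empirical co-occurrence rates from raw counts; in separate training, local factors like these are estimated independently and only combined at test time. Plain counting over toy label sequences is an illustrative stand-in for the paper's conditional models.

```python
# Minimal sketch of the co-occurrence rate: CR(a, b) = p(a, b) / (p(a) p(b)).
# Each local factor is estimated independently, mirroring the "separate
# training" idea of combining locally trained factors only at test time.
from collections import Counter
from itertools import pairwise  # Python 3.10+

sequences = [["B", "I", "O"], ["B", "O", "O"], ["B", "I", "I"]]

uni = Counter(tag for seq in sequences for tag in seq)
bi = Counter(p for seq in sequences for p in pairwise(seq))
n_uni = sum(uni.values())
n_bi = sum(bi.values())

def cr(a, b):
    """Empirical co-occurrence rate of adjacent tags a, b."""
    p_ab = bi[(a, b)] / n_bi
    return p_ab / ((uni[a] / n_uni) * (uni[b] / n_uni))

print(cr("B", "I"))  # > 1: B and I co-occur more than independence predicts
```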